AITopics | attention mechanism

This paper investigates the learning theory of Transformer networks for regression tasks on the compact Euclidean domain $[0,1]^d$ and $d$-dimensional compact Riemannian manifolds. We propose a novel constructive approximation framework for Transformers that builds local approximations of the target function and aggregates them into a global approximation via softmax partition of unity. This approach leverages the attention mechanism to achieve spatial localization through affine transformations of the input. The softmax activation plays a crucial role in aggregating local approximations to a global output. From an approximation perspective, we prove that a dense Transformer equipped with only two encoder blocks and standard single-hidden-layer point-wise feed-forward networks can achieve a uniform $\varepsilon$-approximation error for $α$-Hölder continuous functions with $α\in (0,1]$ using $\mathcal{O}(\varepsilon^{-d/α})$ total parameters. Building upon this approximation guarantee, we establish a near minimax-optimal generalization error bound of order $\mathcal{O}\big(n^{-\frac{2α}{2α+d}} \log n\big)$ for the empirical risk minimizer, where $n$ is the training data size. The Transformer architecture studied in this paper is dense, shallow and wide, and employs softmax activation and sinusoidal positional encodings, closely reflecting practical implementations.

artificial intelligence, machine learning, transformer, (17 more...)

arXiv.org Machine Learning

2605.08811

Genre:

Research Report (1.00)
Overview (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learned Region Sparsity and Diversity Also Predicts Visual Attention

Zijun Wei, Hossein Adeli, Minh Hoai Nguyen, Greg Zelinsky, Dimitris Samaras

Neural Information Processing SystemsMay-1-2026, 06:15:43 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.94)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
(2 more...)

Add feedback

Review Networks for Caption Generation

Zhilin Yang, Ye Yuan, Yuexin Wu, William W. Cohen, Russ R. Salakhutdinov

Neural Information Processing SystemsMay-1-2026, 05:55:20 GMT

We propose a novel extension of the encoder-decoder framework, called a review network. The review network is generic and can enhance any existing encoderdecoder model: in this paper, we consider RNN decoders with both CNN and RNN encoders. The review network performs a number of review steps with attention mechanism on the encoder hidden states, and outputs a thought vector after each review step; the thought vectors are used as the input of the attention mechanism in the decoder. We show that conventional encoder-decoders are a special case of our framework.

Add feedback

Matching Networks for One Shot Learning

Oriol Vinyals, Charles Blundell, Timothy Lillicrap, koray kavukcuoglu, Daan Wierstra

Neural Information Processing SystemsMay-1-2026, 05:37:24 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)

Add feedback

14319d9cfc6123106878dc20b94fbaf3-Paper.pdf

Neural Information Processing SystemsMay-1-2026, 01:42:51 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.29)

Industry: Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Communications > Social Media (0.68)
(2 more...)

Add feedback

a408234a9b80604a9cf6ca518e474550-Paper-Conference.pdf

Neural Information Processing SystemsApr-26-2026, 22:15:42 GMT

brain network, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

Add feedback

51f15efdd170e6043fa02a74882f0470-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 22:00:30 GMT

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Add feedback

Supplementary Material for " Contrastive Graph Poisson Networks: Semi-Supervised Learning with Extremely Limited Labels "

Neural Information Processing SystemsApr-25-2026, 09:21:43 GMT

All experiments were conducted on a Linux server with a Tesla P40 GPU. Our Contrastive Graph Poisson Network (CGPN) was implemented via PyTorch 1.4.0 [1]. We adopted the Adam optimizer [2] for training. The number of GAT layers was set as two. Other hyperparameters were adjusted based on the corresponding datasets. Tables 1, 2, 3, and 4 provide the details of the important hyperparameters.

artificial intelligence, inductive learning, machine learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.37)

Add feedback

Shape registration in the time of transformers

Neural Information Processing SystemsApr-25-2026, 07:25:05 GMT

In this paper, we propose a transformer-based procedure for the efficient registration of non-rigid 3D point clouds. The proposed approach is data-driven and adopts for the first time the transformer architecture in the registration task. Our method is general and applies to different settings. Given a fixed template with some desired properties (e.g.

machine learning, natural language, registration, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.93)

Technology: